ABSTRACT



A system detects a face within an image by receiving the 
image which includes a plurality of pixels, where a plurality 
of the pixels of the image is represented by respective groups 
of at least three values. The image is filtered by transforming 
a plurality of the respective groups of the at least three 
values to respective groups of less than three values, where 
the respective groups of the less than three values have less
dependency on brightness than the respective groups of the 
at least three values. Regions of the image representative of 
skin-tones are determined based on the filtering. A first 
distribution of the regions of the image representative of the 
skin-tones in a first direction is calculated. A second distri- 
bution of the regions of the image representative of the 
skin-tones in a second direction is calculated, where the first 
direction and the second direction are different. The face 
within the image is located based on the first distribution and 
the second distribution. The estimated face location may 
also be used for tracking the face between frames of a video. 

72 Claims, 5 Drawing Sheets 



[Front-page drawing: block diagram showing frame grabber, image 10, transformation 12, binary image 14, face locator, and communication module; not reproduced.]



04/30/2003, EAST version: 1.03.0002 



US 6,332,033 B1




U.S. Patent Dec. 18, 2001 Sheet 1 of 5 






FIG. 2 







[Drawing: binary image shown as a grid of 0's and 1's; not legibly reproducible from this copy.]






FIG. 6 [drawing: binary image with its x and y histograms; reference numerals 32, 34A-34D, 36 and 38 are legible; the grid itself is not reproducible from this copy.]




SYSTEM FOR DETECTING SKIN-TONE
REGIONS WITHIN AN IMAGE

This application is a continuation of application Ser. No. 09/004,539, filed Jan. 8, 1998, now U.S. Pat. No. 6,148,092, issued Nov. 14, 2000.

BACKGROUND OF THE INVENTION

The present invention relates to a system for locating a human face within an image, and more particularly to a system suitable for real-time tracking of a human face in video sequences.

Numerous systems have been developed for the detection of a target within an input image. In particular, human face detection within an image is of considerable importance. Numerous devices benefit from automatic determination of whether an image (or video frame) contains a human face, and if so where the human face is in the image. Such devices may be, for example, a video phone or a human computer interface. A human computer interface identifies the location of a face, if any, identifies the particular face, and understands facial expressions and gestures.

Traditionally, face detection has been performed using correlation template based techniques which compute similarity measurements between a fixed target pattern and multiple candidate image locations. If any of the similarity measurements exceeds a threshold value, then a "match" is declared, indicating that a face has been detected and where it is located. Multiple correlation templates may be employed to detect major facial sub-features. A related technique, known as "view-based eigen-spaces," defines a distance metric based on a parameterizable sub-space of the original image vector space. If the distance metric is below a threshold value, then the system indicates that a face has been detected.

An alternative face detection technique involves using spatial image invariants, which rely on compiling a set of image invariants particular to facial images. The input image is then scanned for positive occurrences of these invariants at all possible locations to identify human faces.

Yang et al., in a paper entitled "A Real-Time Face Tracker," discloses a real-time face tracking system. The system acquires a red-green-blue (RGB) image and filters it to obtain chromatic colors (r and g), known as "pure" colors in the absence of brightness. The transformation of red-green-blue to chromatic colors is a transformation from a three dimensional space (RGB) to a two dimensional space (rg). The distribution of facial colors within the chromatic color space is primarily clustered in a small region. Yang et al. determined, after a detailed analysis of skin-color distributions, that the skin colors of different people under different lighting conditions have similar Gaussian distributions in the chromatic color space. To determine whether a particular red-green-blue pixel maps onto the region of the chromatic color space indicative of a facial color, Yang et al. teaches the use of a two-dimensional Gaussian model. Based on the results of the two-dimensional Gaussian model for each pixel within the RGB image, the facial region of the image is determined. Unfortunately, the two-dimensional Gaussian model is computationally intensive and thus unsuitable for inexpensive real-time systems. Moreover, the system taught by Yang et al. uses a simple tracking mechanism which results in the position of the tracked face being susceptible to jittering.

Eleftheriadis et al., in a paper entitled "Automatic Face Location Detection and Tracking for Model-Assisted Coding of Video Teleconferencing Sequences at Low Bit-Rate," teaches a system for face location detection and tracking. The system is particularly designed for video data that includes head-and-shoulder sequences of people, which are modeled as elliptical regions of interest. The system presumes that the outlines of people's heads are generally elliptical and have high temporal correlation from frame to frame. Based on this premise, the system calculates the difference between consecutive frames and thresholds the result to identify regions of significant movement, which are indicated as non-zero. Elliptical non-zero regions are located and identified as facial regions. Unfortunately, the system taught by Eleftheriadis et al. is computationally intensive and not suitable for real-time applications. Moreover, shadows or partial occlusions of the person's face result in regions that are not elliptical, and therefore the system may fail to identify such regions as a face. In addition, if the person's face is oriented away from the camera, then the resulting outline of the person's head will not be elliptical and the system may fail to identify the person's head. Also, if there is substantial movement within the background of the image, the facial region may be obscured.

Hager et al., in a paper entitled "Real-Time Tracking of Image Regions with Changes in Geometry and Illumination," discloses a face tracking system that analyzes the brightness of an image within a window. The pattern of the brightness within the window is used to track the face between frames. The system taught by Hager et al. is sensitive to face orientation changes and to partial occlusions and shadows which obscure the pattern of the image. The system is also incapable of initially determining the position of the face(s).

What is desired, therefore, is a face tracking system that is insensitive to partial occlusions and shadows, insensitive to face orientation and/or scale changes, insensitive to changes in lighting conditions, easy to calibrate, and able to determine the initial position of the face(s). In addition, the system should be computationally simple so that it is suitable for real-time applications.

SUMMARY OF THE INVENTION

The present invention overcomes the aforementioned drawbacks of the prior art by providing a system for detecting a face within an image that receives the image which includes a plurality of pixels, where a plurality of the pixels of the image is represented by respective groups of at least three values. The image is filtered by transforming a plurality of the respective groups of the at least three values to respective groups of less than three values, where the respective groups of the less than three values have less dependency on brightness than the respective groups of the at least three values. Regions of the image representative of skin-tones are determined based on the filtering. A first distribution of the regions of the image representative of the skin-tones in a first direction is calculated. A second distribution of the regions of the image representative of the skin-tones in a second direction is calculated, where the first direction and the second direction are different. The face within the image is located based on the first distribution and the second distribution.

Using a system that determines skin-tone regions based on a color representation with reduced brightness dependency together with first and second distributions permits the face tracking system to be insensitive to partial occlusions and shadows, insensitive to face orientation and/or scale changes, insensitive to changes in lighting conditions,
and can determine the initial position of the face(s). In
addition, the decomposition of the image using first and 
second distributions allows the system to be computation- 
ally simple so that it is suitable for real-time applications. 
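To illustrate why the decomposition into first and second distributions is cheap, the following is a minimal sketch, assuming the skin-tone regions are available as a 2-D NumPy array of 0's and 1's. The function name `distribution_stats` is hypothetical. It sums the mask along each axis to get the x and y histograms, then computes each histogram's mean and variance with a few more summations, so the cost is a single pass over the pixels.

```python
import numpy as np


def distribution_stats(mask):
    """Reduce a 2-D binary skin-tone mask to x/y histograms and their moments.

    Returns (x_mean, y_mean, x_var, y_var): the mean and variance of the
    column (x) and row (y) coordinates of the 1's, weighted by histogram count.
    """
    mask = np.asarray(mask, dtype=np.float64)
    hist_x = mask.sum(axis=0)   # first distribution: number of 1's per column
    hist_y = mask.sum(axis=1)   # second distribution: number of 1's per row
    n = hist_x.sum()            # N, the total number of skin-tone pixels
    if n == 0:
        raise ValueError("mask contains no skin-tone pixels")
    xs = np.arange(hist_x.size)
    ys = np.arange(hist_y.size)
    x_mean = float((hist_x * xs).sum() / n)
    y_mean = float((hist_y * ys).sum() / n)
    x_var = float((hist_x * (xs - x_mean) ** 2).sum() / n)
    y_var = float((hist_y * (ys - y_mean) ** 2).sum() / n)
    return x_mean, y_mean, x_var, y_var


if __name__ == "__main__":
    demo = np.zeros((8, 10), dtype=np.uint8)
    demo[2:5, 3:7] = 1  # a 3-row by 4-column block of skin-tone pixels
    print(distribution_stats(demo))  # x_mean 4.5, y_mean 3.0, x_var 1.25, y_var about 0.667
```

How the means and variances are turned into a bounding rectangle is left open here; any particular scaling of the variances would be an additional assumption.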

In the preferred embodiment the estimated face location 
may also be used for tracking the face between frames of a 
video. For simplicity the face motion may be modeled as a 
piece-wise constant two-dimensional translation within the
image plane. A linear Kalman filter may be used to predict 
and correct the estimation of the two-dimensional translation 
velocity vector. The estimated (filtered) velocity may then 
also be used to determine the tracked positions of faces. 
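A minimal sketch of such a linear Kalman filter, assuming NumPy and specializing the model given in the detailed description to F(k) = H(k) = I, so the state is the 2-D translation velocity and the observation is the measured frame-to-frame velocity. The class and helper names, the isotropic noise covariances Q = qI and R = rI, the default values of q, r and P0, and the way the hypothetical `track` helper accumulates positions from the filtered velocity are all illustrative assumptions; the text gives no numeric values.

```python
import numpy as np


class VelocityKalman:
    """Linear Kalman filter for a 2-D translation velocity with F = H = I.

    Assumption: Q = q*I and R = r*I; q, r and p0 are placeholder values.
    """

    def __init__(self, q=1e-2, r=1.0, p0=1.0):
        self.x = np.zeros(2)        # x_hat(0|0) = 0
        self.P = p0 * np.eye(2)     # P(0|0) = P0
        self.Q = q * np.eye(2)
        self.R = r * np.eye(2)

    def step(self, z):
        """Fuse one observed velocity z(k+1); return x_hat(k+1|k+1)."""
        z = np.asarray(z, dtype=np.float64)
        # Predictor with F = I: x_hat(k+1|k) = x_hat(k|k), P(k+1|k) = P(k|k) + Q
        x_pred = self.x
        P_pred = self.P + self.Q
        # Gain with H = I: K = P(k+1|k) [P(k+1|k) + R]^-1
        K = P_pred @ np.linalg.inv(P_pred + self.R)
        # Corrector: innovation z - z_hat(k+1|k), where z_hat = x_pred since H = I
        self.x = x_pred + K @ (z - x_pred)
        # Covariance update: P(k+1|k+1) = (I - K) P(k+1|k)
        self.P = (np.eye(2) - K) @ P_pred
        return self.x.copy()


def track(centers):
    """Hypothetical helper: smooth a list of per-frame detected face centers.

    The observed velocity is the difference of consecutive detected centers;
    tracked positions accumulate the filtered velocity (an assumption).
    """
    kf = VelocityKalman()
    pos = np.asarray(centers[0], dtype=np.float64)
    tracked = [pos.copy()]
    for prev, cur in zip(centers[:-1], centers[1:]):
        v = kf.step(np.subtract(cur, prev))
        pos = pos + v
        tracked.append(pos.copy())
    return tracked
```

With a constant observed velocity the filtered estimate converges toward it, while frame-to-frame noise in the detected centers is averaged out in proportion to the ratio of q to r.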

The foregoing and other objectives, features, and advan- 
tages of the invention will be more readily understood upon 
consideration of the following detailed description of the 
invention, taken in conjunction with the accompanying 
drawings. 

BRIEF DESCRIPTION OF THE SEVERAL 
VIEWS OF THE DRAWINGS 

FIG. 1 is a block diagram of an exemplary embodiment of 
a face detection and tracking system of the present inven- 
tion. 

FIG. 2 is a graph of the distributions of the skin-colors of
different people in chromatic color space with the grey-scale
reflecting the magnitude of the color concentration. 

FIG. 3 is a circle centered generally within the center of 
the distribution shown in FIG. 2. 

FIG. 4 is an image with a face. 

FIG. 5 is a binary image of the face of FIG. 4. 

FIG. 6 is a pair of histograms of the binary image of FIG. 5
together with medians and variances for each histogram.



DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

Referring to FIG. 1, a face detection and tracking system 6 includes an image acquisition device 8, such as a still camera or a video camera. A frame grabber 9 captures individual frames from the acquisition device 8 for face detection and tracking. An image processor 11 receives an image 10 from the frame grabber 9 with each pixel represented by a red value, a green value, and a blue value, generally referred to as an RGB image. The image 10 may alternatively be represented by other color formats, such as, for example, cyan, magenta, and yellow; luminance and two chrominance components (in-phase and quadrature), generally referred to as the YIQ color model; hue, saturation, intensity; hue, lightness, saturation; and hue, value, chroma. However, the RGB format is not necessarily the preferred color representation for characterizing skin color. In the RGB color space the three colors [R, G, B] represent not only the color but also its brightness. For example, if the corresponding elements of two pixels, [R1, G1, B1] and [R2, G2, B2], are proportional (i.e., R1/R2 = G1/G2 = B1/B2), then they characterize the same color albeit at different brightnesses. The human visual system adapts to different brightness and various illumination sources such that a perception of color constancy is maintained within a wide range of environmental lighting conditions. Therefore it is desirable to remove brightness information from the color representation while preserving accurate low dimensional color information. Since brightness is not important for characterizing skin colors under normal lighting conditions, the image 10 is transformed by a transformation 12 (filter) to the chromatic color space. Chromatic colors (r, g), known as "pure" colors in the absence of brightness, are generally defined by a normalization process:

$$r = \frac{R}{R+G+B}$$

$$g = \frac{G}{R+G+B}$$

The effect of the transformation 12 is to map the three 
dimensional RGB image 10 to a two dimensional rg chro- 
matic color space representation. The color blue is redundant 
after the normalization process because $r + g + b = 1$. Any suitable
transformation 12 may be used which results in a color
space where the dependence on brightness is reduced, espe- 
cially in relation to the RGB color space. 
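A minimal NumPy sketch of transformation 12, assuming the image arrives as an (H, W, 3) RGB array. The function name `rgb_to_rg` is hypothetical, and mapping pure-black pixels (R + G + B = 0, where r and g are undefined) to 0 is an assumption the text does not address. Because b = 1 - r - g, only r and g are kept, and proportional RGB triples map to the same (r, g) pair, which is the brightness independence described above.

```python
import numpy as np


def rgb_to_rg(image):
    """Map an (H, W, 3) RGB array to an (H, W, 2) array of chromatic (r, g)."""
    rgb = np.asarray(image, dtype=np.float64)
    total = rgb.sum(axis=-1, keepdims=True)          # R + G + B per pixel
    safe = np.where(total == 0, 1.0, total)          # avoid dividing by zero
    rg = rgb[..., :2] / safe                         # r = R/(R+G+B), g = G/(R+G+B)
    # Black pixels carry no chromaticity; map them to (0, 0) by assumption.
    return np.where(total == 0, 0.0, rg)


if __name__ == "__main__":
    # The second pixel is the first at twice the brightness.
    pixels = np.array([[[100, 50, 25], [200, 100, 50]]])
    print(rgb_to_rg(pixels))  # both pixels give the same (r, g)
```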

It has also been found that the distributions of the skin- 
colors of different people are clustered in chromatic color 
space, as shown in FIG. 2. The grey-scale in FIG. 2 reflects 
the magnitude of the color concentration. Although skin 
colors of different people appear to vary over a wide range, 
they differ much less in color than in brightness. In other 
words, the skin-colors of different people are actually quite 
similar, while mainly differing in intensities. 

The two primary purposes of the transformation 12 are to 
(1) facilitate distinguishing skin from other objects of an 
image, and (2) detect skin tones irrespective of the
particular color of the person's skin which differs from 
person to person and differs for the same person under 
different lighting conditions. Accordingly, a suitable trans- 
formation 12 facilitates the ability to track the face(s) of an 
image equally well under different lighting conditions even
for people with different ethnic backgrounds. 

Referring to FIG. 3, the present inventor determined that 
a straightforward characterization of the chromaticity dis- 
tribution of the skin tones may be a circle 20 centered 
generally within the center of the distribution shown in FIG. 
2. Alternatively, any suitable regular or irregular polygonal 
shape (including a circle) may be used, such as a square, a 
pentagon, a hexagon, etc. The use of a polygonal shape 
permits simple calibration of the system by adjusting the 
radius of the polygonal shape. The region encompassed by 
the polygonal shape therefore defines whether or not a 
particular pixel is a skin tone. In addition, it is computa- 
tionally simple to determine whether or not a particular set 
of rg values is within the region defined by the polygonal 
shape. If the rg values are within the polygonal shape, 
otherwise referred to as the skin-tone region, then the 
corresponding pixel of the image 10 is considered to be a 
facial feature, or otherwise having a skin tone. 
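The text leaves the inclusion test unspecified beyond its being computationally simple. The sketch below, assuming NumPy and using hypothetical function names, shows a squared-distance test for the circle 20 and, for the polygonal alternative, a standard even-odd (ray-casting) point-in-polygon test swapped in here for illustration. The circle center, radius, and polygon vertices are calibration values the caller must supply; none are given in the text.

```python
import numpy as np


def in_skin_circle(rg, center, radius):
    """True where an (r, g) pair lies inside the circular skin-tone region."""
    rg = np.asarray(rg, dtype=np.float64)
    d2 = (rg[..., 0] - center[0]) ** 2 + (rg[..., 1] - center[1]) ** 2
    # Comparing squared distances avoids a square root per pixel.
    return d2 <= radius ** 2


def in_skin_polygon(rg, vertices):
    """Even-odd (ray-casting) test of (r, g) points against a polygon."""
    rg = np.asarray(rg, dtype=np.float64)
    r, g = rg[..., 0], rg[..., 1]
    inside = np.zeros(r.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        (r1, g1), (r2, g2) = vertices[i], vertices[(i + 1) % n]
        # Does a ray from the point toward increasing r cross this edge?
        crosses = (g1 > g) != (g2 > g)
        with np.errstate(divide="ignore", invalid="ignore"):
            r_cross = r1 + (g - g1) * (r2 - r1) / (g2 - g1)
        inside ^= crosses & (r < r_cross)
    return inside
```

Calibration then amounts to adjusting `radius` (or scaling the polygon vertices about the center), as the paragraph above describes.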

Based on whether each pixel of the image 10 is within the 
skin tone region the system generates a binary image 14 
corresponding to the image 10. The binary image 14 has a 
value of 1 for each pixel of the image 10 that is identified as 
a skin tone. In contrast, the binary image 14 has a value of 
0 for each pixel of the image that is not identified as a skin 
tone. It is to be understood that groups of pixels may 
likewise be compared on a group by group basis, instead of 
a pixel by pixel basis, if desired. The result is a binary image 
14 that contains primarily 1's in those portions of the image
10 that contain skin tones, such as the face, and primarily 0's
in the remaining portions of the image. It is noted that some
portions of non-facial regions will have skin tone colors and
therefore the binary image 14 will include a few 1's at
non-face locations. The opposite is also true: facial regions
may include pixels that are indicative of non-skin tones and 
will therefore be indicated by 0's. Such regions may include 
beards, moustaches, and hair. For example, the image 10 as 
shown in FIG. 4 may be mapped to the binary image 14 as 
shown in FIG. 5. 
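Combining the chromatic transform with the circular region gives the binary image 14. The sketch below is hypothetical: `skin_mask`, its parameters, and the tile averaging used for the optional group-by-group comparison are illustrative assumptions, since the text does not say how a group of pixels is reduced before the comparison. The circle center and radius must be supplied by the caller.

```python
import numpy as np


def skin_mask(image, center, radius, block=1):
    """Binary image 14: 1 where a pixel (or block of pixels) is a skin tone, else 0.

    image:          (H, W, 3) RGB array.
    center, radius: circular skin-tone region in (r, g) space (caller-supplied).
    block:          if > 1, average non-overlapping block x block tiles first
                    (a hypothetical way of comparing groups of pixels).
    """
    rgb = np.asarray(image, dtype=np.float64)
    if block > 1:
        h = rgb.shape[0] // block * block
        w = rgb.shape[1] // block * block
        # Drop any ragged edge, then average the RGB values of each tile.
        rgb = rgb[:h, :w].reshape(h // block, block, w // block, block, 3).mean(axis=(1, 3))
    total = rgb.sum(axis=-1)
    valid = total > 0                      # black pixels have no chromaticity
    safe = np.where(valid, total, 1.0)
    r = rgb[..., 0] / safe
    g = rgb[..., 1] / safe
    inside = (r - center[0]) ** 2 + (g - center[1]) ** 2 <= radius ** 2
    return (inside & valid).astype(np.uint8)
```

Reversing the 0/1 convention, or returning a weighted likelihood instead of a hard decision, would only change the last line.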
Alternatively, the representation of the 0's and 1's may be reversed, if desired. Moreover, any other suitable representation may be used to distinguish those portions that define skin tones from those portions that do not. Likewise, the transformation 12 may produce weighted values that are indicative of the likelihood that a pixel (or region of pixels) has a skin tone.

As shown in FIG. 5, the facial region of the image is generally indicated by the primary grouping of 1's. The additional 1's scattered throughout the binary image 14 do not indicate a facial feature and are generally referred to as noise. In addition, the facial region also includes some 0's, likewise generally referred to as noise.

The present inventor came to the realization that the two dimensional binary image 14 of skin tones may further be decomposed into a pair of one dimensional models using a face locator 16. The reduction of the two dimensional representation to a pair of one dimensional representations reduces the computational requirements necessary to calculate the location of the face. Referring to FIG. 6, the mean of the distribution of the 1's (skin-tones) is calculated in both the x and y directions. The distribution is a histogram of the number of 1's in each direction. The mean may be calculated by $\mu = \frac{1}{N}\sum_{i} x_i$. The approximate central location 38 of the face is determined by projecting the x-mean 30 and the y-mean 32 onto the binary image 14. The variance of the distribution in each of the x and y directions is also calculated. The variance may be calculated by $\sigma^2 = \frac{1}{N}\sum_{i} (x_i - \mu)^2$. The variances 34a-34d indicate the width of the facial feature in its respective directions. Projecting the variances 34a-34d onto the binary image 14 defines a rectangle around the facial region. The mean and variance are generally insensitive to variations for random distributions of noise. In other words, the mean and variance are robust, so such additional 1's and 0's are not statistically important. Under different lighting conditions for the same person and for different persons, the mean and variance technique defines the facial region. Moreover, the mean and variance are techniques merely requiring the summation of values, which is computationally efficient.

The system may alternatively use other suitable statistical techniques on the binary image 14 in the x and y directions to determine a location indicative of the central portion of the facial feature and/or its size, if desired. Also, a more complex calculation may be employed if the data has weighted values. The system may also decompose the two-dimensional binary image into directions other than x and y.

The face locator and tracker 16 provides the general location of the center of the face and its size. The output of image processor 11 provides data to a communication module 40 which may transmit or display the image in any suitable format. The face tracking system 6 may enhance the bit rate for the portion of the image containing the face, as suggested by Eleftheriadis.

The estimated face location may also be used for tracking the face between frames of a video. For simplicity the face motion may be modeled as a piece-wise constant two-dimensional translation within the image plane. A linear Kalman filter may be used to predict and correct the estimation of the two-dimensional translation velocity vector. The estimated (filtered) velocity may then also be used to determine the tracked positions of faces.

The preferred system model for tracking the motion is:

$$x(k+1) = F(k)\,x(k) + w(k)$$

$$z(k+1) = H(k+1)\,x(k+1) + v(k+1)$$

where x(k) is the true velocity vector to be estimated, z(k) is the observed instantaneous velocity vector, w(k) and v(k) are white noise, and F(k) = I, H(k) = I for piece-wise constant motion. The Kalman predictor is:

$$\hat{x}(k+1 \mid k) = F(k)\,\hat{x}(k \mid k), \qquad \hat{x}(0 \mid 0) = 0$$

$$\hat{z}(k+1 \mid k) = H(k+1)\,\hat{x}(k+1 \mid k)$$

The Kalman corrector is:

$$\hat{x}(k+1 \mid k+1) = \hat{x}(k+1 \mid k) + K(k+1)\,\Delta z(k+1 \mid k)$$

$$\Delta z(k+1 \mid k) = z(k+1) - \hat{z}(k+1 \mid k)$$

where K(k+1) is the Kalman gain. The Kalman gain is computed as:

$$K(k+1) = P(k+1 \mid k)\,H^{T}(k+1)\left[H(k+1)\,P(k+1 \mid k)\,H^{T}(k+1) + R(k+1)\right]^{-1}$$

The covariances are computed as:

$$P(k+1 \mid k) = F(k)\,P(k \mid k)\,F^{T}(k) + Q(k), \qquad P(0 \mid 0) = P_0$$

$$P(k+1 \mid k+1) = \left[I - K(k+1)\,H(k+1)\right]P(k+1 \mid k)$$

where $Q(k) = E[w(k)\,w^{T}(k)]$, $R(k) = E[v(k)\,v^{T}(k)]$ and $P_0 = E[x(0)\,x^{T}(0)]$.

In the presence of lighting fluctuation and image noise, the tracked face image may jitter. A nonlinear filtering module therefore may be included in the tracking system to remove the undesirable jittering. A simple implementation of the nonlinear filtering module is to cancel any movement of the tracked face which is smaller in magnitude than a prescribed threshold and shorter in duration than another prescribed threshold.

A particular application suitable for the face detection and tracking system described herein involves a video phone. Other suitable devices may likewise be used. An image of the background without a person present is obtained by the system. Thereafter images are obtained in the presence of the person. Each image obtained is compared against the background image to distinguish the foreground portion of the image from the background image previously obtained. The recipient's video phone has a nice background image displayed thereon. The foreground, which is presumably the person, is transmitted to and overlaid on the nice background image of the recipient's video phone in a frame-by-frame manner. The location of the face is determined by the face tracking system to smooth out the movement of the person and remove jitter.

Alternatively, the nice background image may be transmitted to the recipient's video phone, and is preferably transmitted only once per session. This provides the benefit of disguising the actual background environment and potentially reducing the bandwidth requirements.

The system may be expanded using the same teachings to locate and track multiple faces within an image.

The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.

What is claimed is:

1. A method of detecting a skin-tone region within an image comprising the steps of:

(a) receiving said image including a plurality of pixels, where a plurality of said pixels of said image is represented by respective groups of at least three values;
(b) filtering said image by transforming a plurality of said
respective groups of said at least three values to respective
groups of less than three values, where said respective
groups of said less than three values have
less dependency on brightness than said respective 
groups of said at least three values; 

(c) determining regions of said image representative of
skin-tones based on said filtering of step (b); 

(d) calculating a first distribution of said regions of said 
image representative of said skin-tones in a first direc- 
tion; 

(e) calculating a second distribution of said regions of said 
image representative of said skin-tones in a second 
direction, where said first direction and said second 
direction are different, the alignment of said first direc- 
tion does not change based on the color distribution of 
said image, and the alignment of said second direction 
does not change based on the color distribution of said 
image; and 

(f) locating said skin-tone region within said image based 
on said first distribution and said second distribution. 

2. The method of claim 1 where said image is from
a video containing multiple images. 

3. The method of claim 1 where said image includes a 
human face. 

4. The method of claim 1 where said at least three values 
includes a red value, a green value, and a blue value. 

5. The method of claim 4 where said respective groups of 
less than three values includes an r value defined by said red
value divided by the summation of said red value, said green 
value, and said blue value, and a g value defined by said 
green value divided by the summation of said red value, said 
green value, and said blue value. 

6. The method of claim 1 wherein at least one of said 
regions is an individual pixel of said image. 

7. The method of claim 1 wherein said determining of step 
(c) is based on a polygonal shape. 

8. The method of claim 1 wherein said determining of step 
(c) is based on a circle. 

9. The method of claim 1 wherein at least one of said first 
distribution and said second distribution is a histogram. 

10. The method of claim 1 wherein said first distribution 
is in an x-direction.

11. The method of claim 10 wherein said second distri- 
bution is in a y-direction. 

12. The method of claim 11 wherein said first distribution 
and said second distribution are in orthogonal directions. 

13. The method of claim 1 wherein said first distribution 
and said second distribution are independent of each other. 

14. The method of claim 1 further comprising the steps of: 

(a) calculating a first generally central location of said first 
distribution; 

(b) calculating a first generally central location of said 
second distribution; and 

(c) locating said skin-tone region based on said first 
generally central location of said first distribution and 
said first generally central location of said second 
distribution. 

15. The method of claim 14 wherein at least one of said 
first generally central location of said first distribution and 
said first generally central location of said second distribu- 
tion is a mean. 

16. The method of claim 14 wherein the size of said 
skin-tone region is based on the variance of said first 
distribution and the variance of said second distribution. 

17. The method of claim 1 wherein said skin-tone region 
is tracked between subsequent frames. 
18. The method of claim 17 wherein jitter movement of
said skin-tone region is reduced between said subsequent 
frames. 

19. A method of detecting a skin-tone region within an 
image comprising the steps of: 

(a) receiving said image including a plurality of pixels, 
where a plurality of said pixels of said image is 
represented by respective groups of at least three val- 
ues; 

(b) filtering said image by transforming a plurality of said 
respective groups of said at least three values in said 
first color space to respective groups of less than three 
values in a second color space where said respective 
groups of said less than three values have less
dependency on brightness than said respective groups of said
at least three values; 

(c) determining regions of said image representative of 
skin-tones based on said filtering of step (b); 

(d) calculating a first distribution of said regions of said 
image representative of said skin-tones in a first direc- 
tion in said first color space; 

(e) calculating a second distribution of said regions of said 
image representative of said skin-tones in a second 
direction in said first color space, where said first 
direction and second direction are different; and 

(f) locating said skin-tone region within said image based
on said first distribution and said second distribution. 

20. The method of claim 19 where said image is
from a video containing multiple images.

21. The method of claim 19 where said image includes a 
human face. 

22. The method of claim 19 where said first and second
directions are independent of the color distribution of said
image. 

23. The method of claim 19 where said first and second 
directions are temporarily the same. 

24. The method of claim 19 wherein at least one of said 
regions is an individual pixel of said image. 

25. The method of claim 19 wherein said determining of 
step (c) is based on a polygonal shape. 

26. The method of claim 19 wherein said determining of 
step (c) is based on a circle. 
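Claims 25 and 26 decide skin-tone membership with a polygonal shape or a circle. A reasonable reading is that the shape bounds a skin-tone area in the brightness-reduced color plane (for example the (r, g) plane of claim 41). The Python sketch below tests a chromaticity point against a circle and against a polygon using the even-odd ray-casting rule. The boundary values `SKIN_CENTER`, `SKIN_RADIUS`, and `SKIN_POLYGON` are hypothetical and would in practice be fitted to sample skin pixels.

```python
def in_circle(r, g, center, radius):
    """True if chromaticity (r, g) lies inside a circle in the (r, g) plane."""
    cr, cg = center
    return (r - cr) ** 2 + (g - cg) ** 2 <= radius ** 2


def in_polygon(r, g, vertices):
    """Even-odd (ray-casting) point-in-polygon test in the (r, g) plane."""
    inside = False
    n = len(vertices)
    for i in range(n):
        r1, g1 = vertices[i]
        r2, g2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through g?
        if (g1 > g) != (g2 > g):
            # r coordinate where the edge crosses that line
            r_cross = r1 + (g - g1) * (r2 - r1) / (g2 - g1)
            if r < r_cross:
                inside = not inside
    return inside


# Hypothetical skin-tone boundaries, for illustration only.
SKIN_CENTER, SKIN_RADIUS = (0.45, 0.31), 0.05
SKIN_POLYGON = [(0.38, 0.26), (0.52, 0.26), (0.52, 0.36), (0.38, 0.36)]
```

A neutral gray pixel maps to (1/3, 1/3) and falls outside both example shapes, which is the intended behavior of a skin-tone filter.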

27. The method of claim 19 wherein at least one of said 
first distribution and said second distribution is a histogram. 

28. The method of claim 19 wherein said first distribution 
is in an x-direction.

29. The method of claim 28 wherein said second distribution
is in a y-direction.

30. The method of claim 29 wherein said first distribution 
and said second distribution are in orthogonal directions. 

31. The method of claim 19 wherein said first distribution 
and said second distribution are independent of each other. 
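Claims 27 through 31 describe the two distributions as histograms taken along orthogonal x and y directions, each computed independently of the other. A minimal Python sketch, assuming the skin-tone determination has already produced a binary mask (`mask[y][x]` truthy for skin-tone pixels; the function name is hypothetical), projects the mask onto each axis:

```python
def projection_histograms(mask):
    """Project a binary skin-tone mask onto the x and y axes.

    mask is a list of rows; mask[y][x] is truthy for skin-tone pixels.
    Returns (x_hist, y_hist): x_hist[x] counts skin pixels in column x,
    y_hist[y] counts skin pixels in row y.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    x_hist = [0] * width
    y_hist = [0] * height
    for y, row in enumerate(mask):
        for x, is_skin in enumerate(row):
            if is_skin:
                x_hist[x] += 1
                y_hist[y] += 1
    return x_hist, y_hist
```

The resulting histograms can be passed directly to the mean and variance calculations of claims 32 through 34.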

32. The method of claim 19 further comprising the steps 
of: 

(a) calculating a first generally central location of said first 
distribution; 

(b) calculating a first generally central location of said 
second distribution; and 

(c) locating said skin-tone region based on said first 
generally central location of said first distribution and 
said first generally central location of said second 
distribution. 

33. The method of claim 32 wherein at least one of said 
first generally central location of said first distribution and 
said first generally central location of said second distribu- 
tion is a mean. 

34. The method of claim 32 wherein the size of said
skin-tone region is based on the variance of said first 
distribution and the variance of said second distribution. 

35. The method of claim 19 wherein said skin-tone region 
is tracked between subsequent frames. 

36. The method of claim 35 wherein jitter movement of 
said skin-tone region is reduced between said subsequent 
frames. 

37. A method of detecting a skin-tone region within an 
image comprising the steps of: 

(a) receiving said image including a plurality of pixels,
where a plurality of said pixels of said image is
represented by respective groups of at least three values;

(b) filtering said image by transforming a plurality of said 
respective groups of said at least three values to respec- 
tive groups of less than three values, where said respec- 
tive groups of said less than three values has less 
dependency on brightness than said respective groups 
of said at least three values; 

(c) determining regions of said image representative of 
skin-tones based on said filtering of step (b) and based 
on at least one of a polygonal shape and a circle; 

(d) calculating a first distribution of said regions of said 
image representative of said skin-tones in a first
direction;

(e) calculating a second distribution of said regions of said 
image representative of said skin-tones in a second 
direction, where said first direction and said second 
direction are different; and 

(f) locating said skin-tone region within said image based 
on said first distribution and said second distribution. 
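Claim 37 combines the brightness-reducing filter with a shape test and the two directional distributions. The following end-to-end Python sketch strings the steps together for a circular skin-tone boundary and, following claims 50 and 51, uses the mean of each distribution as the located position. The function name, the circle parameters, and the decision to skip pure-black pixels are assumptions made for illustration.

```python
def detect_skin_region(pixels, center=(0.45, 0.31), radius=0.05):
    """Locate a skin-tone region in an RGB image (a sketch of claim 37's steps).

    pixels: list of rows of (R, G, B) tuples.
    center, radius: hypothetical circle in (r, g) chromaticity space.
    Returns (mean_x, mean_y) of skin-tone pixels, or None if none are found.
    """
    cr, cg = center
    height, width = len(pixels), len(pixels[0])
    x_hist, y_hist = [0] * width, [0] * height
    for y, row in enumerate(pixels):
        for x, (red, green, blue) in enumerate(row):
            total = red + green + blue
            if total == 0:
                continue  # assumption: black pixels carry no chromaticity
            r, g = red / total, green / total  # step (b): filtering
            if (r - cr) ** 2 + (g - cg) ** 2 <= radius ** 2:  # step (c)
                x_hist[x] += 1  # step (d): first distribution
                y_hist[y] += 1  # step (e): second distribution
    n = sum(x_hist)
    if n == 0:
        return None
    # Step (f): locate the region from the mean of each distribution.
    mean_x = sum(i * c for i, c in enumerate(x_hist)) / n
    mean_y = sum(i * c for i, c in enumerate(y_hist)) / n
    return mean_x, mean_y
```

For a 3 x 3 image whose only skin-tone pixels sit at (1, 1) and (2, 1), the sketch returns (1.5, 1.0).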

38. The method of claim 37 where said image is
from a video containing multiple images.

39. The method of claim 37 where said image includes a 
human face. 

40. The method of claim 37 where said at least three 
values includes a red value, a green value, and a blue value. 

41. The method of claim 40 where said respective groups
of less than three values includes an r value defined by said
red value divided by the summation of said red value, said 
green value, and said blue value, and a g value defined by 
said green value divided by the summation of said red value, 
said green value, and said blue value. 
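Claim 41 spells out the brightness-reducing transform: r = R / (R + G + B) and g = G / (R + G + B). Multiplying R, G, and B by a common brightness factor leaves r and g unchanged, which is why these two values depend less on brightness than the original three. A minimal Python sketch follows; the function name is hypothetical, and mapping a zero sum to (0.0, 0.0) is an assumption, since the claim does not address black pixels.

```python
def to_rg_chromaticity(red, green, blue):
    """Map an (R, G, B) triple to normalized (r, g) chromaticity.

    r = R / (R + G + B) and g = G / (R + G + B), as defined in claim 41.
    Scaling all three values by the same brightness factor leaves (r, g)
    unchanged. Black pixels (R + G + B == 0) are mapped to (0.0, 0.0) here;
    that choice is an assumption, since the claim does not cover a zero sum.
    """
    total = red + green + blue
    if total == 0:
        return 0.0, 0.0
    return red / total, green / total
```

For example, (200, 100, 100) and its half-brightness version (100, 50, 50) both map to (0.5, 0.25). The third component, b = B / (R + G + B), equals 1 - r - g and is therefore redundant, so two values suffice.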

42. The method of claim 37 wherein at least one of said 
regions is an individual pixel of said image. 

43. The method of claim 37 wherein said determining of 
step (c) is based on a polygonal shape. 

44. The method of claim 37 wherein said determining of 
step (c) is based on a circle.

45. The method of claim 37 wherein at least one of said 
first distribution and said second distribution is a histogram. 

46. The method of claim 37 wherein said first distribution 
is in an x-direction.

47. The method of claim 46 wherein said second distri- 
bution is in a y-direction. 

48. The method of claim 47 wherein said first distribution 
and said second distribution are in orthogonal directions. 

49. The method of claim 37 wherein said first distribution 
and said second distribution are independent of each other. 

50. The method of claim 37 further comprising the steps
of:

(a) calculating a first generally central location of said first 
distribution; 

(b) calculating a first generally central location of said 
second distribution; and 

(c) locating said skin-tone region based on said first 
generally central location of said first distribution and 
said first generally central location of said second 
distribution. 

51. The method of claim 50 wherein at least one of said
first generally central location of said first distribution and 
said first generally central location of said second distribu- 
tion is a mean. 

52. The method of claim 50 wherein the size of said 
skin-tone region is based on the variance of said first 
distribution and the variance of said second distribution. 

53. The method of claim 51 wherein said skin-tone region
is tracked between subsequent frames. 

54. The method of claim 53 wherein jitter movement of 
said skin-tone region is reduced between said subsequent 
frames. 

55. A method of detecting a skin-tone region within an
image comprising the steps of: 

(a) receiving said image in an image color space including 
a plurality of pixels, where a plurality of said pixels of 
said image is represented by respective groups of at 
least three values; 

(b) filtering said image by transforming a plurality of said 
respective groups of said at least three values to respec- 
tive groups of less than three values in a transformed 
color space, where said respective groups of said less 
than three values has less dependency on brightness 
than said respective groups of said at least three values; 

(c) determining pixels in said image color space representative
of skin-tones based on said filtering of step (b);

(d) calculating a first distribution of said pixels of said 
image representative of said skin-tones in a first 
direction;

(e) calculating a second distribution of said pixels of said 
image representative of said skin-tones in a second 
direction, where said first direction and said second 
direction are different and said first direction and said 
second direction are the same; and 

(f) locating said skin-tone region within said image color 
space based on said first distribution and said second 
distribution. 

56. The method of claim 55 where said image is
from a video containing multiple images.

57. The method of claim 54 where said image includes a 
human face.

58. The method of claim 55 where said at least three 
values includes a red value, a green value, and a blue value. 

59. The method of claim 58 where said respective groups
of less than three values includes an r value defined by said
red value divided by the summation of said red value, said
green value, and said blue value, and a g value defined by
said green value divided by the summation of said red value,
said green value, and said blue value.

60. The method of claim 55 wherein at least one of said 
regions is an individual pixel of said image. 

61. The method of claim 55 wherein said determining of 
step (c) is based on a polygonal shape. 

62. The method of claim 55 wherein said determining of 
step (c) is based on a circle. 

63. The method of claim 55 wherein at least one of said 
first distribution and said second distribution is a histogram. 

64. The method of claim 55 wherein said first distribution 
is in an x-direction.

65. The method of claim 64 wherein said second distri- 
bution is in a y-direction. 

66. The method of claim 65 wherein said first distribution 
and said second distribution are in orthogonal directions. 

67. The method of claim 55 wherein said first distribution 
and said second distribution are independent of each other. 

68. The method of claim 55 further comprising the steps
of:

(a) calculating a first generally central location of said first 
distribution;

(b) calculating a first generally central location of said 
second distribution; and 

(c) locating said skin-tone region based on said first 
generally central location of said first distribution and 
said first generally central location of said second 
distribution. 

69. The method of claim 68 wherein at least one of said 
first generally central location of said first distribution and 
said first generally central location of said second distribu- 
tion is a mean. 

70. The method of claim 68 wherein the size of said
skin-tone region is based on the variance of said first 
distribution and the variance of said second distribution. 

71. The method of claim 55 wherein said skin-tone region 
is tracked between subsequent frames. 

72. The method of claim 71 wherein jitter movement of 
said skin-tone region is reduced between said subsequent 
frames. 